Abstract

This is the final report of a series of experiments designed to study impasses in the
learning of skills with a strong perceptual component. Several series of experiments
were designed with the purpose of producing experimentally manipulable impasses or
plateaus in the course of learning. Subjects in learning studies identified targets in
various complex computer-presented displays. Among the factors manipulated were
complexity, noise, salience, biasing instructions, and the distribution of target features
across boundaries of displays. Impasses were produced, but patterns of impasse phenomena
were not reproduced reliably enough to support or disconfirm a theory of impasses in
learning.

The Concept of a Learning Impasse

This project was motivated by experiences in prior work on medical 
expertise and its acquisition (Lesgold, 1984a,b; Lesgold, Rubinson et al.,
1988). We found that medical diagnostic performance showed certain
aspects of nonmonotone change with practice, and this led us to wonder
whether learning could be enhanced by finding ways to avoid apparent
plateaus and setbacks. The concept of learning plateaus has had a
checkered history in psychology (cf. Keller, 1958), but the discussions of
plateaus were very superficial, simply asserting that they resulted from poor
behavioral engineering and would not occur in any sensible instructional
setting. We felt that modern science and technology created many
circumstances in which plateaus might occur, and we wanted to gain some 
explanatory and experimental control over the phenomenon. 

Our experience with impasses in learning came from studies of 
radiological expertise (Lesgold, 1984a,b; Lesgold, Rubinson et al., 1988) and
especially from learning studies that we conducted near the end of the
radiology studies. The first phenomenon we noticed occurred in studies
using an expert-novice type of comparative paradigm. We had no real
novices. Rather, we compared radiologists with five or more years of
post-residency experience with two groups of residents having either less than two
years of residency experience or more than two years. In those studies, we
found that the more advanced group of residents was less successful than
either the junior resident group or the senior staff group. While the numbers
of subjects were small, the effects were consistent. In several cases, junior 
residents in one study were accidentally used later as senior residents in a 
second study; on the same films, they reverted from correct diagnoses earlier 
in their careers to incorrect diagnoses later. 

We also conducted a number of training studies in which we taught 
people over hundreds of trials to "diagnose" artificially generated displays that 
were similar to chest x-ray pictures and based on a more-or-less accurate 
anatomical model of the chest. In these unpublished studies, we varied the 
amount of conceptual knowledge about the chest that was provided to 
subjects, and we found that subjects taught an appropriate mental model for 
the chest and its connection with the displays took as long or longer in 
initial learning and showed no greater transfer to displays based on
variations in the chest "diseases" on which the original displays were based
(e.g., collapsed left upper lung instead of collapsed right middle lung) than
subjects who did not receive the conceptual training. Further, some display
types showed no learning over long periods of training (i.e., no movement
above chance performance).

After reading some of the literature on non-monotone aspects of 
development and some of the concept learning literature, it became apparent 
to us that certain aspects of modern life create opportunities to view the 
world in ways that are more subject to learning impasses than might be the 
case in a more "natural" world. Our view has been, in essence, that
impasses occur only in cases where (a) the situation to be understood or 
recognized is extremely complex, (b) the structure of features apparent in the 
situation does not map very directly onto any model of the world that the 
learner might have, and (c) the learner has not yet acquired any direct 
organization of the microfeatures of the situation into higher-order features 
that might have such a direct mapping into his/her conceptual model 
repertoire. 

One example of such a situation is passive sonar image interpretation. 
Passive sonar images are distributions showing energy levels of different 
sound frequencies over time. The "objects" in such displays do not map 
directly onto the objects of the ocean environment. Rather, they map onto 
summations of sound producing activities. Further, each sound producing 
activity is likely to produce several unique "objects" in a distribution of 
spectral energy over time, and individual components of such "objects" may 
be closer to components of other "objects" than to each other. Accordingly, 
the potentially meaningful units according to the Gestalt rules may not be 
meaningful at all. Such situations seem likely to be artificial—based on 
some man-made artifices—rather than naturally occurring. They are not 
entirely novel, but they are certainly more common with new technologies. 
Other situations of this sort include 12-lead electrocardiograms, well logs 
from oil exploration studies, and densely-packed printed circuit and VLSI 
layouts. 

Lesgold, University of Pittsburgh
Final Report: N00014-86-K-0361

We hoped to bring the impasse phenomena produced by such
situations under experimental control, and that was the purpose of this
project. We were not entirely successful. Indeed, we asked ONR not to
consider the optional third year for our contract, because we feel that
significant progress must await the development of entirely different
experimental approaches from those we took. After performing 19
experiments, we still find ourselves unable to demonstrate and control 
impasse phenomena adequately to meet our standards of empirical science. 
In the sections that follow, we summarize theoretical viewpoints of possible 
relevance, our many empirical studies, and our final conclusions. 


Theoretical Views of Impasses 

There are several levels at which one can view learning impasses. 
Clearly, they can be seen at the cognitive level hinted at in the discussion 
above, either fully within a theoretical stance based on mental models or 
from a developmental point of view. However, they might also be seen from 
a behavioral point of view or from a perceptual learning point of view, and 
certain aspects of these non-cognitive viewpoints seem worthy of note. 


The Behavioral View 

The conditioning literature contains references to certain cases in 
which stimulus patterns either are not conditionable to responses or else 
take a long time to become conditioned. Two related phenomena that have 
been reported are overshadowing and blocking (cf. Mackintosh, 1975). Both
refer to situations in which one stimulus which is correlated with another 
cannot be conditioned to a response. Overshadowing is a phenomenon 
originally reported by Pavlov, in which a more salient stimulus, when 
conditioned to a response, prevents the conditioning of a less salient but 
equally relevant (i.e., predictive) stimulus to that response. For example, if a
weak thermal stimulus is presented shortly before food is supplied, a dog will 
learn to salivate in response to that stimulus. However, if the thermal 
stimulus is always accompanied by a loud noise, only the noise will be 
conditioned. 

Blocking is a term introduced by Kamin (1969) for cases in which conditioning
one stimulus to a response prevents later conditioning of a second element
after both are presented together. For example, if light is used to signal a
shock and then later light and noise together signal the coming shock, the 
noise alone will not come to elicit any shock-related response. This 
phenomenon is similar to one seen in some of our experiments on voice 
spectrogram recognition described below. 

Mackintosh (1975) suggested that a stimulus will be conditioned to the 
extent that it signals a change from what could have been predicted without 
it. Further, he theorized, stimuli that have no marginal predictive power 
become less conditionable. To the extent that a stimulus's predictive power 
is, or appears to the subject to be, stochastic, a change in predictive power
will take time to notice. Hence, if Mackintosh is correct, a stimulus without 
predictive power that becomes predictive will initially suffer a period of slow 
learning because of the compounding of the partial reinforcement effect and 
the initially lower learning rate due to historically being low in marginal 
predictive capability. 
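
The overshadowing and blocking effects described above can be reproduced in a small simulation. The sketch below is illustrative only. It uses the Rescorla-Wagner prediction-error rule rather than Mackintosh's (1975) attentional theory, because the simpler rule is enough to show both effects. The function name, salience values, learning rate, and trial counts are assumptions chosen for illustration and are not taken from the studies cited.

```python
def rescorla_wagner(trials, saliences, rate=0.3, asymptote=1.0):
    """Return associative strengths after a sequence of reinforced trials.

    trials    -- list of sets naming the stimuli presented together on each trial
    saliences -- dict mapping each stimulus name to its salience (assumed values)
    """
    strength = {name: 0.0 for name in saliences}
    for present in trials:
        # Prediction error: the outcome minus what the stimuli present
        # on this trial already predict in combination.
        error = asymptote - sum(strength[s] for s in present)
        for s in present:
            strength[s] += saliences[s] * rate * error
    return strength


# Overshadowing: a loud noise always accompanies a weak thermal cue.
overshadow = rescorla_wagner(
    [{"noise", "thermal"}] * 200, {"noise": 0.9, "thermal": 0.1}
)

# Blocking: light alone is trained first, then light and noise together.
equal = {"light": 0.5, "noise": 0.5}
blocked = rescorla_wagner([{"light"}] * 200 + [{"light", "noise"}] * 200, equal)

# Control: light and noise are trained together from the start.
control = rescorla_wagner([{"light", "noise"}] * 200, equal)
```

In this sketch the weak thermal cue ends with far less associative strength than the loud noise, and noise added after light has already been trained gains almost nothing, whereas noise trained together with light from the start gains substantial strength.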


The Feature Sampling View 

The behavioral data just reviewed may seem of minimal relevance to
impasses in cognitive learning, but they do prompt us to notice several
aspects of the impasse situations we have examined and to better
understand how those situations deviate from experimental paradigms that
have been employed in studying plateaus and impasses. Concept learning
experiments tend to use relatively simple displays. The most common type of 
experiment uses displays in which there are a small number of dimensions 
varied, each involving a small number of display features, e.g., single vs. 
double borders, square vs. triangle, one vs. two central forms, red vs. blue, 
etc. A second type of display form that has been used in experimental work 
is the random deviation from a prototype. The so-called Attneave Figure is 
such a form. To define each prototype, a set of randomly plotted points is 
connected to create a polygon. Instances of the prototype are created by 
introducing small random perturbations of the exact locations of the vertex 
points. Three instances of the same prototype are shown in Figure 1 below. 
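
The construction just described can be sketched in code. This is a minimal illustration of the verbal description only, not the procedure used in any particular study: the function names, the size of the plotting area, the number of vertices, and the jitter range are all assumptions.

```python
import random


def make_prototype(n_vertices, size=100.0, rng=random):
    """Plot n_vertices random points in a size-by-size square.

    Joining the points in list order (and closing the loop) gives the
    prototype polygon. Nothing here prevents edges from crossing.
    """
    return [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n_vertices)]


def make_instance(prototype, jitter=3.0, rng=random):
    """Create one instance by moving every vertex a small random distance."""
    return [
        (x + rng.uniform(-jitter, jitter), y + rng.uniform(-jitter, jitter))
        for x, y in prototype
    ]


rng = random.Random(0)  # fixed seed so the example is repeatable
prototype = make_prototype(8, rng=rng)
instances = [make_instance(prototype, rng=rng) for _ in range(3)]
```

Each instance keeps the vertex order of the prototype, so the three polygons share an overall shape while differing in detail.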

Attneave figures and the simple displays of concept learning 
experiments can be contrasted with the much more complex displays that 
were the target of this project: passive sonar displays, voice spectrograms,
and the like. In the figures that have been used for experimental work, the 
features that might play a role in defining categories are relatively evident. 
In contrast, the meaningful features of the noisy artificial displays in which 
we were interested are very difficult to isolate. Sometimes, critical features or 
feature relationships are never noticed over the course of several hours of 
experimentation. In this respect, standard methodologies of concept learning, 
which look at the relative speed at which different kinds of concepts are 
acquired, and perceptual learning experiments which look at the relative 
speed at which different display types come to be recognized, were not suited 
to our goals. As will be seen below, when we used realistic stimuli, many 
subjects failed ever to learn what to notice. When we used simpler stimuli, 
we failed to get impasse effects.

The time needed to discover which features are relevant in a 
perceptual recognition learning task is an important measure. For example,
Zeaman and House (1963; see also Fisher & Zeaman, 1973) found that
retardates differed from normal subjects in how long it took them to notice 
relevant stimulus features. Once features were noticed by retardates, their 
improvement curves looked about the same as those for normal subjects. 
This motivates an experimental paradigm in which trials until learning starts 
to be evident is a basic measure. However, with the materials in which we 
were interested, such experiments proved impossible to run successfully. In 
order to be practical and yet of sufficient power, the experiments required 
within-subject manipulations. However, when learning failed to occur at all
for some cases, these within-subject studies were not entirely conclusive.

The difficulty problem makes it impossible to clearly separate two 
important potential causes of perceptual learning impasses. One is inability 
to notice critical features, as just discussed. A second, and one that we 
think is important (see the discussions below of our artificial voice 
spectrogram studies) is whether critical feature combinations consist of 
features that are all within the same meaningful region of a display or not. 
As a specific example, consider the case of voice spectrograms for syllables. 
In such displays, it is possible, and obviously meaningful, to parse the 
display into segments corresponding to individual phonemes. The display 
plots time on the x axis against frequency on the y axis, and it makes sense 
to split up the total time into the periods in which each of the phonemes of a 
syllable were uttered. However, since it also takes time for the speech 
apparatus to reconfigure from one phoneme to the next, some of the cues for 
identifying one phoneme are to be found in the features of the phoneme 
immediately before or after. For example, distinguishing /d/ from /g/ is 
generally difficult to impossible without examination of the features of the 
vowel that follows (as in dig vs. gig). 

This is an example of the general problem, cited above, in which the 
apparent spatial components of a display do not map well onto the 
components of the events that gave rise to the display. Unfortunately, we 
failed to gain control over this kind of situation. While some of our final 
experiments demonstrate weakly that such a problem is significant, we could 
not control its emergence well enough to permit the kinds of instructional 
studies we wanted to carry out. This outcome is particularly discouraging 
because better theoretical apparatus is being developed for understanding 
how people come to discover the feature clusters that are relevant to a 
learning task. For example, Billman and Heit (1988) have simulated the
effects of some very general, or weak, metacognitive methods of focused
sampling of potential rules for mapping features and feature combinations
onto categories, a significant step beyond the simple formulations of Zeaman
(Fisher & Zeaman, 1973; Zeaman & House, 1963).

The Developmental View

The developmental literature also provides quite a bit of theoretical 
power for dealing with learning impasses. Again, the problem is that we 
could not gain adequate experimental control to apply current theory. Stage 
theories of cognitive development are inherently theories of impasse, asserting 
that certain learning, possible at later stages of development, cannot occur 
earlier. In fact, the developmental literature is replete with examples of
nonmonotone learning curves, situations in which performance suffers setbacks,
in terms of some fixed criterion, over the course of practice (Bowerman,
1982; Karmiloff-Smith, 1979; Karmiloff-Smith & Inhelder, 1974/1975; Klahr,
1982; Richards & Siegler, 1982; Stavy, Strauss, Orpaz, & Carmi, 1982;
Strauss & Stavy, 1982). Indeed, Strauss and Stavy (1982) listed five kinds of
nonmonotone performance possibilities:

1. Movement from a practiced but inadequate mental
representation of a task situation to a more powerful but
less-well-practiced representation.

2. Uncoordinated combination of two different mental 
representation systems. 

3. Using newly-learned rules that are correct for one
situation in apparently related situations for which they are 
incorrect. 

4. Having lower-order rules to deal with each of two task 
variables but not having the higher-order rules to coordinate 
these lower-order rules. 

5. Having problems adapting a newly-acquired weak 
method to a specific situation for which a more domain-specific 
strong method must be evolved before the new metacognitive 
knowledge can be effective. 

We believe that the problems faced by people trying to learn to 
recognize displays like passive sonar images and voice spectrograms do 
indeed involve mental representation inadequacies, but they are perhaps of a 
slightly different character than has been examined in the developmental
literature. The problem appears to be that in order to quickly apprehend 
these artificial displays, one must be able to recognize complex features that 
are not physically clustered according to the Gestalt laws (e.g., the features 
close together may not be related and ones far apart might be closely 
related). Generally, in order to handle such situations, one needs to be able 
to recognize the relevant lower-order features, to know parsing rules for 
sorting out which lower-order features cluster together, and to understand 
the meaning of the clusters. 

This is not something that people are good at, in general. After all,
the case of speech perception is remarkably similar. The superficial
clustering, in terms of bursts of sound, for spoken language does not match
word boundaries very well (e.g., goo/d eve/ning or a/llon/s en/fants de la
pa/tri/e). Rather, we become highly practiced at matching these sound
patterns to representations of the concepts to which they refer, even though 
that requires a highly specialized parsing. This parsing ability does not arise 
without extensive practice. Even moving from one language to another 
requires substantial practice. Further, in the speech understanding case, 
our own experience tells us that the study of vocabulary and grammar do 
not, themselves, permit understanding of the spoken word—one has to 
practice conversations extensively to learn to understand a new language as 
spoken. Prior reading knowledge certainly helps, but only to a point. 

The time course of such practice makes it very difficult to conduct 
learning studies. As a result, much of developmental psychology involves 
comparisons of performance of different people selected from different points 
in the learning/development curve. Further, extensive interactions and
verbal thinking-aloud protocols are often used. This is sufficient for 
characterizing the course of development, but it does not admit readily the 
possibility of studying systematically varied experience tracks. Small 
amounts of comparative ethnographic work have been done, but for the most 
part developmental methods are insufficient for studying the effects of 
various training interventions. 

Nonetheless, we had hoped to use such methodologies as an adjunct 
to our experimental manipulations. Indeed, in some of the studies reported 
below, we did take protocols in order to better understand how subjects were 
trying to learn to recognize various patterns. However, our failure to
predictably generate impasse effects in experimentally tractable ways kept us 
from pursuing the developmental approach very far. We did, however, get 
some sense in a few of our studies of the ways in which subjects were trying 
to sort out what they were seeing and therefore of the mental models that 
they had for the domains we used. 

Summary of Experimental Efforts 

Since the fall of 1986, a total of 19 experiments were designed in 
which at least one subject was run. Because the experiments used displays 
generated by complex rules, all of the experiments were conducted on Xerox 
artificial intelligence workstations. The programs used to generate the 
displays and to conduct the experiments are available from the authors and 
will be sent without charge to anyone on the ONR Cognitive Science mailing 
list who requests them. The following is a summary of these experiments 
and their results. Individual reports of the experiments give more detailed 
descriptions of the experiments (see "Available Software and Data"). 

Our first attempts to produce reliable and experimentally tractable 
impasses used extremely noisy displays of known object form classes, such 
as animals and airplanes. We chose these displays in the hope that this 
would allow us to keep the tasks simple enough to fit standard experimental 
paradigms and time constraints. We then tried using displays that 
resembled the segmented digits used on LCD watches. Finally, we conducted 
an extensive series of studies using artificially created displays that 
resembled voice spectrograms. 


Lost Plane Experiments: September 1986 - December 1986 

Two experiments were conducted in which subjects studied three 
different drawings of military planes and then were given a series of visual 
search trials in which they were to identify the plane that appeared on the 
screen and its directional orientation (the latter a control for guessing). The 
planes were obscured by a moderate amount of random line noise (lines or 
curves of random length and orientation) and randomly strewn plane parts 
(wings or tails). The two versions of the experiment, called Easy Planes and 
Hard Planes, differed only in the amount of random line noise used. Figures
2 and 3 show examples of an easy and a hard case. 

Method. There were three different plane silhouettes, and the task
was to learn to identify which plane was hidden in the display. The
manipulated variables for the experiments were the Plane Identity (A, B, or
C), the Orientation of the plane (8 compass values), and the type of Plane
Parts used as masking noise (either wings from Plane A or tails from Plane
C). Combinations of these variables produced 48 different pictures, which
were presented to the subject in 4 blocks of 12 trials. Twenty subjects
participated in the Easy Planes experiment, and six participated in the Hard
Planes experiment.
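
The stimulus set described above is a full crossing of the three variables, which can be sketched as follows. This is an illustration under stated assumptions, not the original workstation program: the compass labels, the mask labels, the function name, and in particular the random assignment of pictures to blocks are hypothetical, since the report does not say how pictures were assigned to blocks.

```python
import itertools
import random

PLANES = ["A", "B", "C"]
ORIENTATIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]  # assumed labels
PART_MASKS = ["wings of Plane A", "tails of Plane C"]


def build_blocks(n_blocks=4, seed=0):
    """Cross the three variables and split the pictures into equal blocks."""
    pictures = list(itertools.product(PLANES, ORIENTATIONS, PART_MASKS))
    # Assumption: pictures are shuffled before being dealt into blocks.
    random.Random(seed).shuffle(pictures)
    block_size = len(pictures) // n_blocks
    return [
        pictures[i * block_size:(i + 1) * block_size] for i in range(n_blocks)
    ]


blocks = build_blocks()
```

A simple shuffle like this does not balance Orientation across blocks, so an assignment of this kind would need explicit counterbalancing before block effects could be separated from orientation effects.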

Figure 3. Hard plane facing northeast with tail noise.


Results. Because our focus was on reliably generating learning
impasses, we could not fully control all variables. Specifically, the design of
the experiments unsystematically confounded Orientation with Learning
Block. Hence, a full factorial analysis could not be performed. This should
be kept in mind when considering the following results. For the Easy Planes 
experiment, mean proportion correct over learning blocks increased linearly 
from 0.55 to 0.92 while response time decreased linearly from 33.82 seconds 
to 16.43 seconds. There were no systematic learning differences for the 
different Plane Identities or Parts Masks. For the Hard Planes experiment, 
mean proportion correct increased linearly from 0.44 to 0.79 over learning 
blocks as response time decreased from 55.27 to 41.45 seconds. Again no 
systematic learning differences were observed for either Plane Identity or
Parts Mask type. No learning impasses were observed. 


Lost Animal Experiments: November 1986 - October 1987 

The lost animals experiments were similar in principle to the lost 
planes experiments. Generally, subjects were shown outline drawings of five 
animals to study, and were then presented with several visual search trials 
where they were to identify an animal and specify its orientation. Altogether, 
seven lost animals experiments were conducted. These included 
manipulations of noise type (Easy Animals and Hard Animals), manipulation
of the subject’s advance knowledge of the animal shapes and identities (Free
Response Animals), extended practice on the difficult animals task by the
experimenters (Extended Animals and Nanimals), and comparison of learning
ability with parts masks which were inward projecting, where the parts could 
belong to animals within the picture, or outward projecting, where the parts 
could not belong to animals within the picture (Reversed Animals and 
Within Animals). 


Easy Animals and Hard Animals Experiments

The Easy and Hard Animals experiments were basically the same in 
design as the Lost Plane experiments. Subjects viewed five outline drawings 
of animals and then performed a visual search task where they specified 
which animal was depicted and which orientation it faced. In the Easy 
Animals Experiment, the animals were shown with one of two types of 
random line noise: either straight lines or curved lines. In the Hard Animals 
experiment, the random line noise was augmented with a mask made up of
animal parts (e.g., kangaroo tail, elephant trunk, etc.). 

Method. The manipulated variables were Animal Identity (Penguin,
Camel, Rhinoceros, Kangaroo, Elephant), Orientation (four primary compass 
values), and Noise Type (straight or curved). Combinations of these variables 
produced 40 different pictures which were shown to subjects in blocks of 10 
trials. Sixteen subjects participated in each of the Easy and Hard Animals 
experiments, but no subject participated in both experiments. 

Results. As was the case for the Lost Planes experiments, the Lost
Animals experiments also unsystematically confounded Orientation with 
Learning Block. Hence, no full factorial analysis was possible. Keeping this 
in mind, the mean proportion correct for the Easy Animals experiment 
increased slightly with learning block. The values range from 0.80 to 0.89. 
At the same time, response time decreased from 15.76 seconds to 9.66 
seconds with learning block. So, again there were no reliable impasse 
effects. No systematic learning differences between animals were found, but 
animals disguised in straight line noise were more often detected than 
animals disguised in curved noise. Straight line noise accuracy was at 
ceiling on all four learning blocks, but Curved line noise accuracy appeared 
to improve from 0.67 to 0.84. 

The results for the Hard Animals experiment were that subjects 
performed only slightly above chance during the experiment and never 
improved (0.10 on block 1 to 0.11 on block 4; chance was 0.05). Subjects 
were only slightly more accurate on animals masked by straight line noise 
(0.13) than on animals masked by curved line noise (0.09). It was this 
finding of an apparent impasse that kept us persisting with the animal 
detection studies. 
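
The chance level quoted above is what uniform guessing over the joint response would give, assuming a trial was scored as correct only when both the animal and its orientation were right. That scoring rule is an assumption, and the function name is hypothetical.

```python
def chance_accuracy(n_identities, n_orientations):
    """Probability of a fully correct response under uniform guessing."""
    return 1.0 / (n_identities * n_orientations)


# Five animals crossed with four primary compass orientations.
hard_animals_chance = chance_accuracy(5, 4)

# Observed Hard Animals accuracy on blocks 1 and 4, as reported in the text.
ratio_to_chance = [observed / hard_animals_chance for observed in (0.10, 0.11)]
```

On this reading, the observed accuracies were roughly twice the chance level on both the first and the last block.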


Extended Practice Animals and Nanimals Experiments 

To discover whether the Hard Animals task could be learned, the 
experimenters performed the task over several sessions. In the Extended 
Practice experiment, two experimenters (MM and GG) familiar with the task 
performed it 8 times. In the Nanimals experiment, an experimenter (JT) 
unfamiliar with the task performed it 20 times. In this latter experiment, 
different parts masks were used on each trial to prevent improvement due to 
learning the position of the distractors. 

Method. The experiment was the standard Hard Animals experiment
described above. For the Nanimals experiment, the animal parts mask was 
changed on each problem to prevent the position of the distractors from 
being learned. However, the same set of masks were used on each session. 

Results. Again, no factorial analysis of the results will be presented,
but overall improvement in accuracy and response time was found. That is, 
given adequate practice, learning occurred continuously without impasse. 
For the Extended Practice experiment, one subject (GG) began with ceiling 
accuracy and decreased in response time from a mean of 27.72 seconds on 
the first block of the first session to a mean of 4.83 seconds on the final 
block of the 8th session. The other subject (MM) reached ceiling accuracy on 
the second session and decreased in response time from a mean of 67.42 
seconds on the first block of the second session to 11.91 seconds on the last
block of the 8th session. 

For the Nanimals experiment, the subject (JT) achieved an accuracy of 
0.10 on the first session (comparable to the performance of subjects in the 
Hard Animals experiment) and reached ceiling accuracy by about the 7th 
session. From this point, response time decreased from 26.34 seconds on 
the first block of the 7th session to 9.80 seconds on the final block of the 
20th session. Again, the basic finding is that the task, too difficult for the 
time constraints of ordinary laboratory experimentation, showed no real
impasses when adequate training time was given. 


Reversed Animals and Within Animals Experiments 

Even though continuous learning took place if enough trials were 
given, the hard animals tasks could, on the right time scale, be seen as 
involving impasses in learning, at least for the less-motivated subjects we 
recruited (relative to our own staff in the extended studies). So, we tried to 
find controlled means for making the difficulty of the hard animals conditions 
come and go. These experiments examined whether the search difficulty 
created by the animal parts mask (as was found in the Hard Animals 
experiment) was due to subjects being misled into examining the parts 
contained in the mask. The parts mask used by the Hard Animals 
experiment located animal parts so that if the rest of the animal were 
attached to the part, the whole animal would appear within the stimulus 
picture. For this reason, the mask was called "inward projecting." A second 
mask was designed which located the same parts so that if the rest of the 
animal were attached to the part, most of the animal would be located 
outside of the stimulus picture. This second mask was called "outward 
projecting." The reasoning behind the experiments was that if subjects were
testing part hypotheses during their search, they should be more disrupted 
by the inward projecting mask, whose parts they would have to test, than by 
the outward projecting mask, whose parts they should be able to quickly 
reject as potential targets. The two experiments differ in that the Reversed 
Animals experiment uses a between-subject design while the Within Animals 
experiment uses a within-subject design. 

Method. For the Reversed Animals experiment, eight subjects were
run in the standard Hard Animals experiment (to establish continuity with 
the previous experiment for this subject group) which used the inward 
projecting mask. Sixteen subjects were run in the same task except that the 
outward projecting mask was used in place of the inward projecting one. For 
the Within Animals experiment, the straight and curved line noise masks 
were replaced with a single mask which combined half straight and half 
curved noise. Subjects then saw all of the animal patterns once with the
inward projecting mask and once with the outward projecting mask. 

Results. The results of the Reversed Animals experiment were that the
subjects who searched for animals in outward projecting parts noise 
identified about twice as many animals as the original Hard Animals subjects 
(0.24 vs 0.10), but about the same as the comparison group given the Hard 
Animals task (0.23). Neither the inward nor outward projecting groups 
improved over blocks. This suggested that whatever impasses we were 
observing before were motivational and not cognitive. 

The results of the Within Animals experiment were that subjects 
responded faster to the outward projecting problems than to the inward 
projecting ones (57 seconds vs 38 seconds), but the accuracy on the two 
types of problems was the same (0.32 vs 0.38, respectively) and greater than 
chance. 


LCD Experiment: September 1987 

The LCD experiment looked at transfer of learning in a diagnostic 
reasoning task. The subjects were to diagnose a "fault" in a display 
resembling an LCD numeral display. In each problem in this series, a 
simulated fault caused one or more segments of the seven-segment display 
either to be always on, always off, or reversed: off when it should be on and 
on when it should be off. The subjects, by calling for the display of digits 
from 0 to 9, were to determine which segment(s) were affected and by which 
fault. Two transfer conditions and one control condition were used to 
determine whether learning on a more simple version of the task would 
produce negative transfer to a more complex version. 
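
The fault types described above can be illustrated with a minimal sketch. The segment lettering (a through g) and the digit encodings follow the conventional seven-segment layout, which is an assumption here since the report does not describe the exact display used; the function names are hypothetical.

```python
# Conventional seven-segment lettering (assumed, not taken from the report):
# a=top, b=upper right, c=lower right, d=bottom, e=lower left, f=upper left, g=middle.
DIGIT_SEGMENTS = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}


def displayed(digit, faults):
    """Return the set of lit segments for `digit`.

    `faults` maps a segment letter to 'on' (always on), 'off' (always off),
    or 'reversed' (off when it should be on, on when it should be off).
    """
    intended = set(DIGIT_SEGMENTS[digit])
    lit = set(intended)
    for segment, fault in faults.items():
        if fault == "on":
            lit.add(segment)
        elif fault == "off":
            lit.discard(segment)
        elif fault == "reversed":
            if segment in intended:
                lit.discard(segment)
            else:
                lit.add(segment)
        else:
            raise ValueError(f"unknown fault type: {fault}")
    return lit


def distinguishing_digits(faults_a, faults_b):
    """Digits whose display differs under the two candidate fault hypotheses."""
    return [d for d in range(10)
            if displayed(d, faults_a) != displayed(d, faults_b)]
```

Under these assumptions, an always-off fault and a reversed fault on the same segment can only be told apart by calling for a digit that does not normally light that segment, which is one reason the version of the task with reversed faults demands more deliberate testing than the simple version.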

Method . Fifteen subjects were divided into three conditions. All 
subjects participated in two experimental sessions. In the first condition, 
subjects performed a simple version of the task on the first session and then 
transferred to the full task on the second session. The simple version used 
problems which had only one affected segment, which was either always on 
or always off. In the full version of the task, problems could have either one 
or two affected segments and could be reversed, always on, or always off. In 
the second condition, subjects performed a task which was more complex 
than the simple task, but less complex than the full task, before transferring 
to the full task. In this moderately complex task, problems had only one 
affected segment, but it could be always on, always off, or reversed. On their 
second session, these subjects performed the full task. Finally, the third 
condition received the full task on both sessions. The dependent variable 
was the proportion of correct responses (both segment and disease correct). 

Results . Difference scores between proportion correct on first and 
second sessions were calculated for each subject. The mean values were 
-0.108 for the first condition, -0.010 for the second condition, and 0.030 for 
the third condition. Bonferroni t-tests revealed that subjects who 
experienced the simple version of the task in the first session showed 
significant negative transfer relative to those who experienced the full task (p 
< .05) but that those experiencing the moderately complex task in the first 
session did not show significantly more negative transfer (p > .05). 


Spectrogram Learning Experiments: November 1987 - June 1989 

We shared with the ONR technical monitor the belief that the LCD 
studies were not as interesting a direction to pursue as the more perceptual 
possibilities we were considering and therefore ceased experimentation in this 
line. The remainder of our studies used artificially produced voice 
spectrograms, displays in which time was plotted on the x axis and 
frequency on the y axis, with darkness of a position showing the amount of 
sound energy of that frequency present at that time. Figure 4 shows an 
example of the type of display that we used. 

Nine experiments were run using pseudo-speech spectrograms as 
stimuli. The first studies used a scaling methodology to try to determine 
which visual dimensions of vowel patterns naive subjects would attend to 
(Vowel Scaling experiment and Scale-Learn-Scale experiment). This was 
followed by experiments which looked at the learning of vowel patterns 
(Vowel Transfer experiment), real word patterns (Real Word Learning 
experiment), and finally consonant patterns (Consonant Discrimination 
experiments I, II, and III). A small experiment was also performed which 
tried to examine the influence of subjects’ conceptual understanding of 
speech on their spectrogram reading performance (Instructional Model 
experiment). 

Figure 4. Example of artificial speech spectrogram. 

To understand the logic of the experiments, a few facts about speech 
spectrograms are worth noting. There are two types of phonemes, vowels 
and consonants. Vowels consist primarily of sound energy clustered into 
three main frequency bands, and these bands stay at about the same 
frequency for a relatively long time. Consonants, on the other hand, tend to 
involve faster changes in frequency and somewhat less clustering around a 
small number of core frequencies, called formants. This substantial 
difference in appearance makes it highly likely that even a naive viewer will 
parse a spectrogram display into regions demarcated by phoneme 
boundaries. Critically important to our design is the fact that some 
consonants are indistinguishable from one another if one looks only at the 
part of the spectrogram associated with the temporal duration of the 
consonant. Rather, these consonants must be distinguished by examining 
the effects of the lip and mouth movements they involve on either preceding 
or following vowels. In particular, /d/ and /g/ are distinguished by their 
effects on the vowel which follows them, either "pulling" the start of the 
second and third formants together to the point of overlap or not. 

This has two effects. First, vowel displays vary depending on the 
consonant context in which they appear. However, there are certain aspects 
to vowel displays that are constant. These become the critical features for 
identifying vowels. For identifying consonants, on the other hand, one must 
consider not only the part of the display showing the consonant’s acoustic 
effect but also the neighboring vowel. Further, what is noise with respect to 
vowel identification is critical to neighboring consonant identification. So, 
identifying certain consonants like /d/ and /g/ requires noticing that part of 
the neighboring vowel context is relevant and, in particular, that the relevant 
part is the part that is more or less irrelevant to vowel identification. 

We expected that impasses would occur whenever perceptual learning 
tasks involved distinguishing syllables that differed in whether they began 
with /d/ or /g/, because the needed information for deciding on the 
distinction was spread over two different regions of the display and because 
the vowel context information needed was the "noise" with respect to vowel 
identification. The series of studies we conducted included some in which we 
tried to gather baseline data on feature salience and others in which we 
looked directly for the impasse effect. 


Vowel Scaling Experiment and Scale-Learn-Scale Experiment 

The scaling experiments were, in essence, baseline studies. A 
computer program was written to generate pseudo-speech spectrogram 
patterns based on feature descriptions of real spectrograms. The first 
patterns generated were vowels in a standard form (no distorting consonant 
context, horizontal formants) and in a transformed form (curved formants as 
would result from consonants immediately before or after). To compare how 
similarly subjects would regard the transformed and the standard vowel 
formants, two scaling studies were done. In the first, subjects saw all 
pairwise combinations of 11 vowels in standard and transformed form and 
rated the similarity of each pair on a numerical scale. These values were 
entered into a multidimensional scaling analysis. In the second experiment, 
a different group of subjects made similarity judgments on the 11 standard 
vowel patterns, then learned to distinguish the patterns, and finally scaled 
the patterns again. This was done to see whether learning would change 
how subjects saw the patterns. 

Method . In the first scaling experiment, subjects scaled all pairwise 
combinations of 22 patterns (11 standard and 11 transformed for a total of 
231 pairs). Each pair appeared on a computer screen along with a scale 
ranging from 1 (not similar) to 7 (very similar). Nineteen subjects rated the 
similarity of the 231 pairs. 

In the second experiment, five subjects rated the similarity of 55 pairs 
of vowels (pairwise combinations of the 11 standard vowels), then learned to 
identify the different vowels, and finally rated them again. The rating 
procedure was the same as in the Vowel Scaling experiment. The learning 
procedure had subjects view the 11 vowels in a random order and select the 
name of the vowel from a screen menu. If the response was incorrect, the 
subject was given the correct name. The measure of learning was the 
number of times the subject had to go through the list before getting them 
all right. 
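
As a quick check on the pair counts reported in this Method section, the following sketch enumerates all unordered pairs of patterns. The pattern labels are placeholders invented for illustration; only the counts (22 and 11 patterns) come from the text.

```python
from itertools import combinations
from math import comb

# Placeholder labels: 11 vowels, each in standard and transformed form.
patterns = [f"{form}-{i}"
            for form in ("standard", "transformed")
            for i in range(1, 12)]

# Every unordered pair of distinct patterns, as rated in the first experiment.
all_pairs = list(combinations(patterns, 2))

# Pairs among the 11 standard vowels only, as rated in the second experiment.
standard_pairs = comb(11, 2)

print(len(patterns), len(all_pairs), standard_pairs)
```

The counts agree with the 231 and 55 pairs stated above, since n patterns yield n(n-1)/2 unordered pairs.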

Results . The data were scaled using ALSCAL, a nonmetric 
multidimensional scaling program, and INDSCAL, a related program that also 
examines differences between individual subjects’ data. For the simple 
scaling experiment, the most meaningful ALSCAL solution was found with 
three dimensions. However, the stress value of this solution was 0.267, 
indicating that it was not a very good fit. Nevertheless, this solution tended 
to separate the patterns according to whether they were standard or 
transformed, whether they were low or high vowels (second formant height), 
and whether the formants were transformed by a slight bending (such as 
that which occurs when a vowel follows a bilabial stop) or by a convergence 
of the second and third formants (such as that which occurs when a vowel 
follows a velar stop). 

For the Scale-Learn-Scale experiment, the scaling of the first rating 
achieved a stress of 0.199 in three dimensions, but only two of those 
dimensions, second formant height and vowel width, were readily 
interpretable. An INDSCAL solution indicated that most of the subjects 
weighted second formant height higher than both vowel width and the 
uninterpreted third dimension. On the learning task, subjects took an 
average of 16.4 attempts to learn the 11 vowels. After learning, the subjects 
again rated the similarity of the vowels. On this second rating, their scaling 
solution looked similar to the first one. The three dimensional solution 
achieved a stress of 0.184 and again the recognizable dimensions were 
second formant height and vowel width. An INDSCAL solution was found for 
this second scaling and a comparison of the two revealed that most subjects 
increased their weighting of second formant height and decreased their 
weighting of vowel width. This indicates that learning may have sensitized 
them to using the second formant as a basis for discrimination and thus 
caused them to become less sensitive to the information that might help in 
distinguishing a prior consonant like /d/ or /g/. 


Vowel Transfer Experiment 

One way people might be taught to recognize vowel patterns is by 
training them on the standard vowel forms (which are never encountered 
when "reading" spectrograms of continuous speech) and expecting this 
training to transfer to the transformed cases the learner will encounter. It is 
also reasonable to expect this might not work. If subjects attend to the 
wrong aspects of the standard form, or don’t recognize the transformed vowel 
as an exemplar of the standard form, no transfer would be expected. The 
Vowel Transfer experiment was designed to see whether this expectation was 
reasonable. The experiment compared transfer from the standard vowel 
patterns to the transformed vowel patterns with transfer in the opposite 
direction. 

Method . Eight subjects were divided into two groups of four. One 
group was given the task of learning the standard vowels followed by the 
task of learning the transformed vowels. The second group received the 
same tasks but in the reverse order. The learning tasks were the same as 
the one described in the Scale-Learn-Scale experiment. Subjects saw 11 
vowels one at a time in random order and learned to identify them by 
selecting their names from a screen menu. If subjects were wrong, they were 
told which answer was correct. The learning criterion was one errorless pass 
through the 11 vowels. 

Results . Subjects in the first condition, who learned the standard 
vowels first, took an average of 28 blocks to learn the first set of vowels and 
an average of 7.25 blocks to learn the second. Subjects in the second 
condition, who learned the transformed vowels first, took an average of 11.25 
blocks to learn the first task, and also took an average of 11.25 blocks to 
learn the second task. Learning to discriminate the transformed vowels was 
easier than learning to discriminate the standard vowels, likely because the 
transformed vowels are less similar to each other. However, learning the 
transformed vowels first produced a savings of 16.75 blocks on learning the 
standard vowels, while learning the standard vowels first only produced a 
savings of 4 blocks on learning the transformed vowels. 


Real Word Learning Experiment 

The Real Word Learning experiment examined the learning of English 
words made up of a stop consonant followed by a vowel followed by another 
stop consonant. A pseudo-spectrogram pattern was displayed on the screen 
and subjects were free to type in any word they chose as a response. The 
computer was programmed to detect alternate spellings of the target word 
and provided feedback when subjects made an error. 

Method . Nine subjects were shown as many words as time permitted 
in a two hour experiment session (at least 110 and as many as 160). One 
subject’s data was excluded because he was not a native English speaker. 
The subjects were free to respond with whatever word they wished, but most 
of them quickly learned the three letter nature of the patterns. The subjects' 
performance was examined by looking at the total number of correct 
phonemes in intervals of 10 trials. 

Results . The general result was that the subjects showed quick initial 
learning which appeared to level off at less than perfect performance. 
Assuming subjects quickly learned the set of possible responses from the 
feedback they were given (i.e., that there were only six possible consonants 
and six possible vowels), two subjects showed chance performance with no 
improvement. The remaining six subjects each showed either abrupt or 
gradual initial improvement which reached a plateau between 50% and 75% 
correct. Looking at how subjects performed on individual phonemes revealed 
that /b/ and postvocalic /p/ were learned fairly quickly, followed by /d/, 
/t/, and prevocalic /p/, but most subjects had difficulty learning to identify 
/k/ and /g/. What these two patterns had in common was that they were 
identical to another letter (/k/ was identical to /t/ and /g/ was identical to 
/d/) except for their effect on the adjacent vowel. Most stops cause the 
formants of an adjacent vowel to curve slightly down at the consonant-vowel 
boundary, but the velar stops /k/ and /g/ cause the second and third 
formants of the vowel to curve together and meet at the consonant-vowel 
boundary. Subjects apparently had difficulty establishing that this difference 
could signal the distinction between /d/ and /g/ or /t/ and /k/. 

To establish that full learning would eventually occur on this task (i.e., 
that subjects were not at a permanent impasse), an additional subject was 
run for a total of seven consecutive sessions (1113 trials) and showed steady 
initial improvement for the first two sessions which appeared to level off 
during the third and fourth sessions before resuming and reaching ceiling 
performance. 
This finding suggests that although learning appeared to plateau early for the 
first group of subjects, it would likely resume improving until it reached 
ceiling. This plateau appears to be due to the difficulty distinguishing the 
/d/ patterns from the /g/ patterns and the /t/ patterns from the /k/ 
patterns. This finding inspired the Consonant Discrimination Learning 
experiments which are described below. 


Instructional Model Experiment 

The purpose of this pilot experiment was to see if we could improve 
subjects’ ability to learn to read the real word spectrograms by giving them 
information about how speech sounds are made and what components of the 
speech signal are represented in the spectrogram pattern. We looked at two 
types of knowledge: conceptual knowledge about how speech sounds are 
made, and specific cue knowledge about which spectrogram features are 
important for discriminating certain sounds. 

Method . Thirty-two subjects were divided into four groups. These 
groups were: Cue Alone, Model Alone, Separate Model and Cue, and 
Integrated Model and Cue. The groups differed according to the verbal 
instructions given to the subjects. In the Cue Alone condition, subjects were 
shown a table which distinguished the six stop consonants and six vowels by 
visual features of their spectral representation. These cues included striation 
(voicing), width (duration), dark spots (formants), dark band height (place of 
articulation), and dark band curving (coarticulation effects). The subjects 
were told how they could use these cues to distinguish the consonants and 
vowels. In the Model Alone condition, subjects were shown a table which 
distinguished the consonants and vowels according to articulatory features 
(listed in parentheses above), but verbal instructions did not relate these 
features to any visual spectrogram features. In the Separate Model and Cue 
condition, subjects received all of the information in the Model Alone and 
Cue Alone conditions, but this information was not related together in the 
verbal instructions. Finally, in the Integrated Model and Cue condition, all of 
the model and cue information was given and tied together in the verbal 
instructions. After receiving these instructions, subjects were given the Real 
Word Learning experiment previously described. Subjects viewed a total of 
74 words. Their performance on the first 10 words and the last 10 words 
was measured. On the intervening problems, subjects had access to a help 
window which displayed the tables they had seen during instruction. The 
difference between their performance on the first 10 trials and the last 10 
trials was used as a measure of their improvement. 

Results . The mean number of phonemes correctly identified on the 
first 10 problems over all subjects was 4.53. Because subjects knew that 
there were only six possible responses for each of the three phonemes in a 
pattern, chance performance on a block of 10 trials was 5.0 phonemes. A 
t-test showed that this first block performance was not better than chance, 
t(31)=1.49, p > .05; and none of the means for the four instructional 
conditions deviated significantly from the others (range was 4.12 to 5.0). The 
mean number of phonemes correctly identified on the last 10 problems over 
all subjects was 11.16. An analysis of variance was performed to compare 
whether the difference in first and last block performance varied with 
condition. The analysis found that although significant learning occurred 
between the first and last block, F(1,24)=40.96, p < .001, this improvement 
was equal for all instructional conditions, F(3,24)=0.98, p > 0.40. 
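
The chance level of 5.0 phonemes per block follows directly from the design, as the short sketch below shows. It assumes, as the text does, that subjects guess uniformly among six alternatives for each of three phonemes; the function name is hypothetical.

```python
from fractions import Fraction


def chance_correct(trials, phonemes_per_trial, alternatives):
    """Expected number of correct phonemes under uniform random guessing."""
    return Fraction(trials * phonemes_per_trial, alternatives)


# One block: 10 words, 3 phonemes each, 6 possible responses per phoneme.
block_chance = chance_correct(10, 3, 6)
block_maximum = 10 * 3

# Reported block means expressed as proportions of the 30 possible phonemes.
first_block_proportion = 4.53 / block_maximum
last_block_proportion = 11.16 / block_maximum

print(block_chance, block_maximum)
print(round(first_block_proportion, 3), round(last_block_proportion, 3))
```

Expressed this way, the last-block mean of 11.16 corresponds to roughly 37% of phonemes correct, compared with about 17% expected by chance.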

One other measure of interest was the number of times subjects in 
each condition used the help screen. The results showed that subjects in 
the Model Alone condition used the help screen the least, an average of 4.75 
times. Subjects in the Cue Alone and Integrated Model and Cue conditions 
used the facility the same amount, an average of 8.78 and 8.75 times 
respectively. The subjects in the Separate Model and Cue condition used the 
help facility the most, an average of 10.38 times. These values may reflect 
how useful the subjects in these conditions thought the help information 
was, but this did not appear to affect their learning very much. 

The conclusion of this study was that no instructional effect was found 
for this task. The reasons are not clear, but it is likely that subjects did not 
adequately learn the instructional material and could not make use of it 
during practice. No effort was made to assess the extent of their learning of 
the instructional material, so this explanation is unverified. 


Consonant Discrimination Learning Experiment I 

In the Real Word Learning experiment, it was observed that subjects 
had more difficulty learning consonants which had to be distinguished by a 
vowel feature (formant curvature). The first Consonant Discrimination 
Learning experiment was undertaken to test whether this was a real effect, or 
whether it was due to the unequal number of consonants in each of the 
learning blocks. The basic design of this experiment was the same as the 
Real Word Learning experiment; but subjects were given all C-V-C 
combinations of the consonants and vowels, and they were not told of any 
relationship between patterns and real words. Subjects responded by 
selecting consonant and vowel names from a menu rather than typing in the 
word. Feedback was provided on error trials. 

Method . Ten subjects were shown pseudo-spectrogram patterns of all 
CVC combinations of the consonants /b/, /p/, /d/, /g/, /t/, /k/ and 
vowels /i/, /e/, /ae/, /O/, /u/, /o/. This produced 216 patterns, which 
were shown over three to four sessions. The patterns were divided into 
blocks of twelve, so that each consonant appeared in prevocalic and 
postvocalic form twice, and each vowel appeared twice. The presentation of 
these blocks and the order of patterns within a block was randomized. 
Subjects were also questioned verbally about their hypotheses and intuitions 
about the task. The stimuli were drawn so that /b/ and /p/ appeared 
similar but could be distinguished by more than one feature (such as texture 
and shading); /t/ and /k/ appeared similar but could be distinguished by a 
single feature (number of dark spots inside their pattern); and /d/ and /g/ 
appeared identical but could be distinguished by the curving of the adjacent 
vowel’s formants (/g/ caused the formants to curve together). The block on 
which subjects learned to distinguish each of these three pairs was the main 
dependent variable. 
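
The stimulus count in this design can be reproduced by enumerating every consonant-vowel-consonant combination. In the sketch below the vowel labels are placeholders (V1 to V6), since only the number of vowels matters for the count.

```python
from itertools import product

CONSONANTS = ["b", "p", "d", "g", "t", "k"]
VOWELS = [f"V{i}" for i in range(1, 7)]  # placeholder labels for the six vowels

# Every prevocalic consonant, vowel, and postvocalic consonant combination.
cvc_patterns = list(product(CONSONANTS, VOWELS, CONSONANTS))

BLOCK_SIZE = 12
n_blocks = len(cvc_patterns) // BLOCK_SIZE

# In a balanced block of 12 patterns, each of the 6 consonants fills the
# prevocalic slot twice and the postvocalic slot twice, and each vowel
# appears twice.
appearances_per_item = BLOCK_SIZE // len(CONSONANTS)

print(len(cvc_patterns), n_blocks, appearances_per_item)
```

This confirms the 216 patterns stated above and shows that they divide evenly into 18 blocks of twelve.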

Results . Subjects were considered to have learned a pair if they 
responded correctly on four consecutive blocks with only one error. Of the 
10 subjects, 9 learned the /b/-/p/ distinction, 6 learned the /t/-/k/ 
distinction, and 2 learned the /d/-/g/ distinction. McNemar’s exact test for 
correlated proportions showed that significantly more people learned the /b/- 
/p/ distinction than learned the /d/-/g/ distinction (p < .02), but the test of 
whether more people learned the /t/-/k/ distinction than learned the /d/- 
/g/ distinction was non-significant (p=.10). A matched pairs sign test was 
used to test which distinctions were learned earlier than the others. This 
test revealed that the /b/-/p/ and /t/-/k/ distinctions were learned earlier 
than the /d/-/g/ distinction (p < .01 and p < .02 respectively). 
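
For readers unfamiliar with McNemar's exact test, the sketch below shows how it reduces to a two-sided binomial test on the discordant subjects, that is, those who learned one distinction but not the other. The report does not give the discordant counts, so the example values are an assumption: if both /d/-/g/ learners also learned /b/-/p/, then 7 subjects learned only /b/-/p/ and none learned only /d/-/g/.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b = subjects who learned the first distinction but not the second,
    c = subjects who learned the second distinction but not the first.
    Under the null hypothesis each discordant subject is equally likely
    to fall in either group, so min(b, c) is Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Assumed discordant counts for /b/-/p/ versus /d/-/g/ (not reported in the text).
p_value = mcnemar_exact(7, 0)
print(p_value)
```

Under that assumption the p-value is below .02, which is in line with the reported comparison; other splits of the discordant subjects would give different values.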

These results appear to have verified the previous finding. It was more 
difficult to learn a discrimination if the critical feature was in another part (in 
a vowel in this case). However, it is not certain whether this effect is due to 
segmentation, the salience of the cues, or some other factor. The third 
Consonant Discrimination Learning experiment followed up this question. 

Consonant Discrimination Learning Experiment II 

The next Consonant Discrimination Learning experiment looked at 
whether the random noise added to the spectrogram patterns had any 
influence on the difficulty of learning the patterns. Presumably, if people are 
biased towards looking within a part for a feature which will identify it, then 
the presence of random noise will supply more hypotheses for them to 
consider than if the random noise were not present. The task in this 
experiment was simplified by using only the /d/-/g/ and /t/-/k/ consonant 
distinctions and only one consonant in each pattern. The presence of noise 
(random edging) was varied between subjects. 

Method . The patterns shown to subjects were all C-V combinations of 
the consonants /d/, /g/, /t/, /k/ and the vowels /i/, /o/, /ae/, /e/. The 
16 different patterns were shown 18 times for a total of 288 trials. In the 
no-noise condition, these patterns appeared with straight edges; in the noise 
condition, the lengths of the lines used to draw the pattern were set to a 
random number within about 6 mm from a set ending point. For both 
conditions, the problems were divided into blocks of four, where each 
consonant and vowel appeared once. The subjects responded separately to 
the consonant and vowel by selecting the symbol for each from a screen 
menu. The major dependent variable was the block on which a subject 
learned the /d/-/g/ and /t/-/k/ distinctions. Twelve subjects were run to 
obtain 4 full or partial learners in each condition. 

Results . All four non-learners were in the noise condition. In the 
no-noise condition, three of the subjects learned the /t/-/k/ distinction before 
the /d/-/g/ distinction. In the noise condition, two subjects learned the /t/- 
/k/ distinction first, and two learned the /d/-/g/ distinction first. Not 
enough subjects were run to perform any statistical tests. The results do 
appear to suggest that the addition of the random noise made the task 
somewhat more difficult to learn. 

Consonant Discrimination Learning Experiment (Selection Task) 

Another question that occurred to us was whether the subjects learned 
the /d/-/g/ distinction last simply because it was more difficult, or whether 
they had to learn all of the other distinctions first to eliminate other features 
from consideration. Would we still find this same learning order if subjects 
could select which stimulus patterns they could see? To test this, we set up 
an experiment in which a subject responded to one block of trials in the 
same way as in the previous experiment, but then for the next block of trials 
could select which patterns to see by selecting the appropriate phonemes. 

Method . It was necessary to run only one subject on this mixed 
presentation/selection task. 

Results . The basic result is that the subject learned the /b/-/p/ 
distinction first, but then focused on the /d/-/g/ distinction and learned it 
before the /t/-/k/ distinction. 


Consonant Discrimination Learning Experiment III 

The final Consonant Discrimination Learning experiment tried to 
discover whether the learning difficulty associated with the vowel 
transformation cue was attributable to segmentation or some other factor 
such as salience. This experiment used a complex design to control for 
salience and task demands, but used the same task as the Noise condition 
in the second Consonant Discrimination experiment. 

Method . To control for any differences in cue salience, each type of 
cue, the formant curving cue (/d/-/g/ distinction) and the number of 
formants cue (/t/-/k/ distinction) was presented both within the phoneme 
being learned and outside of it in another part. Because this could not be 
done using a within subjects design, an incomplete blocks design was used. 
A pair of subjects provided one observation for both cues presented within 
and outside of a part. Thus, any difference in salience between the two cues 
should equally affect within and between object discriminations. To control 
for any task demands which may be produced by associating different parts 
of the pattern with different responses, subjects made a single consonant 
response to the whole pattern and never made a separate response for 
vowels. However, half of the subject pairs were given instructions biasing 
them to look at either the consonant or vowel (whichever contained the 
within object cue). Trials were divided into 8 problem blocks with each 
consonant represented twice and each of four vowels represented once. The 
block on which a subject learned one of the consonant distinctions was the 
major dependent variable. 

Results . Subjects were considered to have learned a consonant 
distinction if they were correct on two consecutive blocks with one allowed 
error on the second block. Eighteen of the subjects learned both the within 
part distinction and the between part distinction, 13 learned only the within 
part distinction, 5 learned only the between part distinction, and 12 learned 
neither distinction and were not included in the analysis. Matched pairs 
sign tests were performed to determine which distinctions were more difficult. 
These tests revealed that the number of formants cue was more difficult to 
learn than the formant curving cue when the cues were between parts, but 
there was no difference between the two cues when they were within a part. 
This indicates that segmentation interacts with cue salience to produce 
learning difficulty. 

However, the pattern of these results did not reproduce those reported 
in the first Consonant Discrimination Learning experiment. This is most 
likely due to the change in the task. Subjects in the previous experiment 
responded to both consonants and vowels, but subjects in the present 
experiment only made a consonant response to the whole pattern. Subjects 
making the vowel response likely thought the formant curving was relevant to 
vowel identity and failed to use it to distinguish the consonants. When the 
necessity of making a vowel identification was removed, subjects could 
consider any feature relevant to the consonant identity. 

The results of this experiment indicate that subjects may be biased 
towards searching within a part for its distinguishing features and that this 
bias may be enhanced when other task demands make use of any between 
part cues. 


Conclusions 

The studies performed, and other pilot efforts with similar outcomes, 
make it clear that a significantly different approach will be needed if progress 
is to be made on impasses in perceptual learning. We did try other 
approaches, including extensive taking of protocols and probing for 
hypotheses about what characterized various displays. However, we were not 
able to gain sufficient control over the generation of impasses to have them 
occur reliably, for most of our subjects, and over multiple experiments. Yet 
there were, along the way, striking examples of extended periods in which 
little or no learning took place. 

For example, in some of our studies that showed impasses, at least 
temporarily, we were able to fit individual subjects’ data with models that 
claimed performance to be constant at one level until it rose, rather quickly, 
to a second level. This type of model is relatively consistent with the Zeaman 
and House (1963) representation of learning as consisting of a period in 
which there is a search for relevant features followed by rapid learning of the 
mappings of those features onto categories. Figure 5 shows the data for one 
student on Consonant Discrimination Learning Experiment I. The problem 
was not that we never got such nice impasse patterns; rather, it was that we 
never gained control over when they would appear. Indeed, the same 
experiment yielded protocols supporting the difficulty subjects had in noticing 
feature clusters that crossed meaningful unit (phoneme) boundaries. 
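
The two-level model described above can be illustrated with a minimal sketch. It is not the fitting procedure used in the project, which this report does not specify. It assumes a least-squares criterion, treats the rise between levels as instantaneous, and uses a hypothetical function name (fit_step_model) and made-up accuracy scores in place of any subject's data.

```python
def fit_step_model(scores):
    """Least-squares fit of a two-level step function to a learning curve.

    scores: per-block proportion correct for one subject (hypothetical data).
    Returns the change block and the mean level before and after it.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two blocks to fit two levels")

    best = None
    # Try every possible change point k: blocks [0, k) sit at the first
    # level and blocks [k, n) sit at the second level.
    for k in range(1, n):
        before, after = scores[:k], scores[k:]
        level_before = sum(before) / len(before)
        level_after = sum(after) / len(after)
        # Sum of squared errors around the two constant levels.
        sse = (sum((x - level_before) ** 2 for x in before)
               + sum((x - level_after) ** 2 for x in after))
        if best is None or sse < best[0]:
            best = (sse, k, level_before, level_after)

    sse, k, level_before, level_after = best
    return {
        "change_block": k,
        "level_before": level_before,
        "level_after": level_after,
        "sse": sse,
    }


# Hypothetical learning curve: a plateau near chance, then a quick rise.
scores = [0.50, 0.52, 0.48, 0.50, 0.50, 0.90, 0.92, 0.88, 0.90]
fit = fit_step_model(scores)
print(fit["change_block"], fit["level_before"], fit["level_after"])
```

A fit of this kind locates the end of a plateau for a single subject after the fact. It does not predict when the plateau will end, which was the control the experiments failed to achieve.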

We conclude that the best available tools for studying impasses in 
learning are probably the tools used in comparative expertise ("expert-novice") 
research, rather than those of the learning study. That is, one must find 
natural situations in which impasses occur over periods of extended learning 
practice and carefully assess performance at benchmark points in the course 
of such apprenticeship. Independent of circumstances, the time one can 
have to work with a research subject is always limited, and for the present 
purpose, it should be invested in understanding a current state of knowledge 
rather than trying to induce a new state that may take too long to appear. 
In a sense, then, the original radiological expertise studies may have been 
closer to the right approach than the work undertaken in the present project. 

We did demonstrate impasses, though, and our views of why they 
occur and how they might be overcome still seem reasonable. Specifically, 
impasses arise when the relevant features of a situation are not apparent. 
Because feature noticing is extremely well developed in humans, this problem 
generally arises only when (a) the features defining a category are tied, by the 
Gestalt rules and prior knowledge of the environment, more closely to 
features relevant to other domain tasks than to each other; (b) a mental 
model of how the displays come to look the way they do has not been 
acquired or is not mentally manipulable with facility; and (c) no advice (rules) 
on how to parse the display has been acquired. Some of the displays that 
arise in modern technological applications have these characteristics. Further, 
because the display forms are designed by experts, no one may notice that 
they have the shortcomings just mentioned. 


Available Software and Data 

Longer reports of each of the experiments described above, including 
photocopies of the display screens, are available without charge to any 
researcher on the ONR cognitive science mailing list. Other researchers will 
be accommodated but may have to pay reproduction costs if supplies run 
out. The Interlisp software used to produce the stimuli and run the 
experiments is available under the same terms. A technical report 
describing the last few studies is being issued simultaneously with this final 
report. Address all inquiries to Alan Lesgold, LRDC, University of Pittsburgh, 
Pittsburgh, PA 15260. 